Software for estimating life-cycle models
Hans-Martin von Gaudecker
Universität Bonn & IZA
Copenhagen, 13 June 2022
Ferrall’s (2020) Tale of Two Papers
- Thomas MaCurdy (1981). “An Empirical Model of Labor Supply in a Life-Cycle Setting,” Journal of Political Economy 89, 6, 1059-1085.
- Kenneth Wolpin (1984). “An Estimable Dynamic Stochastic Model of Fertility and Child Mortality,” Journal of Political Economy 92, 5, 852-874.
Ferrall’s (2020) Tale of Two Papers
- Both papers required a huge amount of customised code in the early 1980s
- MaCurdy (1981) approximates a structural model so that it has a closed form; nowadays this is panel IV
- Wolpin (1984) uses full-blown structural estimation of a discrete choice model; this requires almost the same amount of coding today
Why is this a problem?
- Time to PhD / content thereof
- Fixed cost of entering the field later
- Software quality, code sharing culture
- Division of labor difficult (counterexample: develop treatment effect estimator and supply R package)
⇒ Much slower progress than in other fields
Libraries for solving / estimating structural models
- QuantEcon
- Heterogeneous Agents Resources and toolKit (HARK)
- consav
- niqlow
- respy
Some observations
- Programmer / user distinction?
- Exploitation of computational tricks?
- How to make the most of different researcher profiles?
A new attempt
- People
- Prior code
- Building blocks
- Discussion
OSE Group
- Janoš Gabler, Moritz Mendel, Tobias Raabe, Sebastian Gsell, Christian Zimpelmann, Annica Gehlen, Klara Röhrl, Tim Mensinger, Max Blesch
- Partnering with CS departments
- All open source, happy to involve others!
Prior code: Learning experiences
- Code for French, von Gaudecker & Jones — Solving life-cycle models on GPUs and some negative examples
- skillmodels — Speeding up code with Numpy, Numba, and Jax; automatic differentiation
- respy — Highly optimized solution of Eckstein-Keane-Wolpin models
- econ-project-templates — Sensible ways to structure research projects and pipelines
- GETTSIM, pytask — DAGs and introspection
- SID — Structural CoViD-19 infection model, real-world & real-time test of the general approach
OOP vs functional
- OOP: Hide state in objects grouping data and functions (methods)
- Functional: Eliminate state by relying on pure functions alone
- No side effects
- Only depend on inputs
- Mostly atomic operations + JIT compilers
- OOP: Overriding things via subclassing always ends up a mess in complex projects
- Functional + DAG/introspection: Replacing functions by others with compatible interfaces is very natural
Ecosystem for estimating structural life-cycle models
- Utilities used throughout: pybaum, dags
- Function optimisation, standard errors, sensitivity analysis — estimagic
- Depiction of Taxes & Transfers system — GETTSIM, OpenFisca
- Running tasks in a complex project — pytask
- Solution of dynamic programming problem — lcm
Some helpers
pybaum: (Fully) flexible specifications of parameters
- Modelled after JAX pytrees
- Very natural to express structure of parameters
dags: automate program execution
- introspection of function arguments
- build and execute DAG from that
pytask: dags on steroids for running pipelines
- files & functions as primitives
- allows mixing in R, Julia, Stata; compilation of LaTeX documents, …
pybaum
Specify your parameters as:
start_params = {
    "preferences": {
        "leisure_weight": 0.9,
        "ces": 0.5,
    },
    "work": {
        "hourly_wage": 25,
        "hours": 2_000,
    },
    "time_budget": 24 * 7 * 365,
    "consumption_floor": 3_000,
}
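Optimizers usually expect a flat vector, while the nested dictionary above is the natural way to write parameters down. The following is a minimal pure-Python sketch of the flatten / unflatten round trip that pytree-style libraries provide. The helpers `flatten` and `unflatten` are hypothetical names written for this illustration; they are not pybaum's API, and the parameter values are only a shortened copy of the example above.

```python
def flatten(tree, prefix=()):
    """Return a list of (path, leaf) pairs for a nested dict of numbers."""
    if isinstance(tree, dict):
        pairs = []
        for key, subtree in tree.items():
            pairs.extend(flatten(subtree, prefix + (key,)))
        return pairs
    return [(prefix, tree)]


def unflatten(paths, values):
    """Rebuild the nested dict from the paths and a flat sequence of values."""
    tree = {}
    for path, value in zip(paths, values):
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree


params = {
    "preferences": {"leisure_weight": 0.9, "ces": 0.5},
    "work": {"hourly_wage": 25, "hours": 2_000},
    "consumption_floor": 3_000,
}

pairs = flatten(params)
paths = [path for path, _ in pairs]
flat = [leaf for _, leaf in pairs]  # the flat view an optimizer would work on

# Any operation on the flat list can be mapped back to the nested layout.
doubled = unflatten(paths, [2 * x for x in flat])
```

The user only ever reads and writes the nested structure; the flat representation stays an internal detail of the numerical routines.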
dags
def utility(consumption, leisure, params):
    ɑ = params["preferences"]["leisure_weight"]
    ɣ = params["preferences"]["ces"]
    c = (1 - ɑ) ** (1 / ɣ) * consumption ** ((ɣ - 1) / ɣ)
    l = ɑ ** (1 / ɣ) * leisure ** ((ɣ - 1) / ɣ)
    return (c + l) ** (ɣ / (ɣ - 1))

def leisure(params):
    return params["time_budget"] - params["work"]["hours"]

def income(params):
    return params["work"]["hourly_wage"] * params["work"]["hours"]

def consumption(income, params):
    c_min = params["consumption_floor"]
    return income if income > c_min else c_min

def unrelated(working_hours):
    raise NotImplementedError()
dags
import dags

model = dags.concatenate_functions(
    functions=[utility, unrelated, leisure, consumption, income],
    targets=["utility", "consumption"],
    return_type="dict",
)
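The mechanism behind such a call can be sketched with the standard library alone: read each function's argument names, treat every argument that matches another function's name as an edge, keep only the ancestors of the targets, and evaluate in topological order. The `concatenate` helper and the simplified scalar functions below are hypothetical and written for this illustration; this is not how dags is implemented internally.

```python
import inspect
from graphlib import TopologicalSorter


def income(wage, hours):
    return wage * hours


def consumption(income, floor):
    return max(income, floor)


def leisure(time_budget, hours):
    return time_budget - hours


def unrelated(working_hours):
    raise NotImplementedError()


def concatenate(functions, targets):
    """Combine functions into one callable that computes the targets."""
    funcs = {f.__name__: f for f in functions}
    args = {name: list(inspect.signature(f).parameters) for name, f in funcs.items()}

    # Keep only the targets and their ancestors; everything else is pruned.
    needed, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in funcs and name not in needed:
            needed.add(name)
            stack.extend(args[name])

    # Map each function to the functions that produce its arguments.
    graph = {n: {a for a in args[n] if a in funcs} for n in needed}
    order = list(TopologicalSorter(graph).static_order())

    def combined(**inputs):
        results = dict(inputs)
        for name in order:
            results[name] = funcs[name](**{a: results[a] for a in args[name]})
        return {t: results[t] for t in targets}

    return combined


model = concatenate(
    functions=[consumption, unrelated, leisure, income],
    targets=["consumption", "leisure"],
)
out = model(wage=25, hours=2_000, floor=3_000, time_budget=8_760)
```

Because `unrelated` is not an ancestor of any target, it is never called, so its `NotImplementedError` is never raised. Swapping `consumption` for another function with the same name and compatible arguments changes the model without touching anything else.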
(aside: estimagic)
See notebook
LCM: Basic building blocks
- pybaum — Flexible structuring of parameters, data
- dags — Functions are defined on individual states (all scalars)
- Not shown: Partialling out parameters that do not change during estimation, right at the beginning
- Dispatchers — Leverage JAX to vectorize scalar functions on almost arbitrary state spaces
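The division of labour between scalar functions and dispatchers can be sketched as follows. The slides state that the dispatchers rely on JAX for vectorization; the stand-in below uses a plain Python loop instead so that it runs without extra dependencies. `product_map`, the utility function and the grids are hypothetical and chosen only for illustration.

```python
from functools import partial
from itertools import product
from math import sqrt


def product_map(func, grids):
    """Evaluate a scalar function on the Cartesian product of named grids.

    Returns a dict mapping each grid point (a tuple ordered like `grids`)
    to the function value at that point.
    """
    names = list(grids)
    return {
        point: func(**dict(zip(names, point)))
        for point in product(*(grids[name] for name in names))
    }


def utility(consumption, leisure, leisure_weight):
    # Written for a single state: all arguments are plain scalars.
    return (1 - leisure_weight) * sqrt(consumption) + leisure_weight * sqrt(leisure)


# Fix the parameter that does not vary over the state space up front.
utility_fixed = partial(utility, leisure_weight=0.5)

grids = {"consumption": [1.0, 4.0, 9.0], "leisure": [0.0, 16.0]}
values = product_map(utility_fixed, grids)
```

The author of `utility` never sees a grid or an array; only the dispatcher knows how the state space is laid out.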
LCM: Why?
- Field matures; both models and solution methods become more and more complex:
- More and more difficult to be an expert in both
- More important to build on existing models instead of starting from scratch for each project
- HPC becomes more accessible through libraries
- More flexibility
- GPUs / TPUs better suited than large-scale MPI
LCM: Target users / developers
- Developers of new algorithms (sensible benchmarks as a free lunch; algorithms become available to downstream users)
- Frontier researchers
- Economists in policy analysis
Applied user
- Supplies economic primitives on per-state basis (i.e. everything scalar)
- Utility functions
- Constraints
- Descriptions of states and choice options
- State transitions based on states and choices (including filters that induce sparsity)
LCM
- Builds state space representation
- Builds derived economic functions (e.g. value function, derivatives, …) (still on scalars)
- Infers parameters (anything that is not state or choice), yields template
- Vectorises derived functions on state space
- Builds solution, simulation and likelihood functions
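The parameter inference step can be illustrated with function introspection: any argument that is not a state, a choice, or the output of another user function is treated as a parameter. This is a minimal sketch under that assumption. The function `infer_params`, the example model functions and the layout of the template are hypothetical, not LCM's actual interface.

```python
import inspect
from math import sqrt


def utility(consumption, working, disutility_of_work):
    return sqrt(consumption) - disutility_of_work * working


def next_wealth(wealth, consumption, working, wage, interest_rate):
    return (1 + interest_rate) * (wealth - consumption) + wage * working


def infer_params(functions, states, choices):
    """Collect arguments that are neither states, choices, nor function outputs.

    The result is a template grouped by the function that uses each parameter,
    with None as a placeholder for values the user still has to supply.
    """
    known = set(states) | set(choices) | {f.__name__ for f in functions}
    template = {}
    for func in functions:
        names = [a for a in inspect.signature(func).parameters if a not in known]
        if names:
            template[func.__name__] = {name: None for name in names}
    return template


template = infer_params(
    functions=[utility, next_wealth],
    states=["wealth"],
    choices=["consumption", "working"],
)
```

The user fills in the template instead of maintaining a separate parameter list that can drift out of sync with the model functions.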
Expert user
- Supplies same things as applied user
- Thanks to the dags-based implementation, may implement custom functions for anything
- State space representation
- Derived economic functions
- Vectorization operators directly
State space representation
- dense variables / Cartesian grid
- Memory efficient during calculation
- Memory hungry when storing value / policy functions
- Fast computations
- contingent variables / combined grid
- Memory efficient when storing value / policy functions
- Saves computations
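The storage trade-off between the two representations can be made concrete with a toy state space. In this sketch, age and experience are assumed state variables and `feasible` is an assumed filter (experience cannot exceed the years since age 20); none of these names come from LCM.

```python
from itertools import product

ages = range(20, 25)      # 5 values
experience = range(0, 5)  # 5 values


def feasible(age, exp):
    # Assumed filter: experience cannot exceed the years since age 20.
    return exp <= age - 20


# Dense representation: every combination, feasible or not.
dense = list(product(ages, experience))

# Combined representation: a single index over feasible combinations only.
combined = [point for point in dense if feasible(*point)]
lookup = {point: i for i, point in enumerate(combined)}

share_stored = len(combined) / len(dense)
```

Here the combined grid stores 15 of the 25 points. The price is the extra `lookup` step from a state to its position, which the dense grid gets for free from its regular layout.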
Abstracting from type of variables / grid
- Dispatcher / gridmap decorator provides abstraction during computation
- LCM value / policy function provides abstraction during lookup (WIP, but basically solved)
- No performance penalty
LCM: Roadmap
- Most building blocks are done & tested
- DC-EGM example with brute force should be working in September
- Implement DC-EGM by November (one continuous choice, arbitrary number of DC)
- Have frontier models running by Christmas
Discussion
- Suggestions for development?
- Interested in using (parts of) ecosystem?
- Interested in helping develop (parts of) ecosystem?